@YanshekWoo
Contributor

Checklist

  • My model has a model sheet, report or similar: KaLM-Embedding
  • My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here
  • The results submitted were obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g. Hugging Face
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset, including training splits. If I have, I have disclosed it clearly.

@github-actions

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: KaLM-Team/KaLM-Embedding-X-0605
Tasks: AILAStatutes, AfriSentiClassification, AlloProfClusteringS2S.v2, AlloprofReranking, AmazonCounterfactualClassification, ArXivHierarchicalClusteringP2P, ArXivHierarchicalClusteringS2S, ArguAna, ArmenianParaphrasePC, BUCC.v2, BelebeleRetrieval, BibleNLPBitextMining, BigPatentClustering.v2, BiorxivClusteringP2P.v2, BornholmBitextMining, BrazilianToxicTweetsClassification, BulgarianStoreReviewSentimentClassfication, CEDRClassification, CLSClusteringP2P.v2, CSFDSKMovieReviewSentimentClassification, CTKFactsNLI, CataloniaTweetClassification, Core17InstructionRetrieval, CovidRetrieval, CyrillicTurkicLangClassification, CzechProductReviewSentimentClassification, DBpediaClassification, DalajClassification, DiaBlaBitextMining, EstonianValenceClassification, FaroeseSTS, FilipinoShopeeReviewsClassification, FinParaSTS, FinancialPhrasebankClassification, FloresBitextMining, GermanSTSBenchmark, GreekLegalCodeClassification, GujaratiNewsClassification, HALClusteringS2S.v2, HagridRetrieval, IN22GenBitextMining, IndicCrosslingualSTS, IndicGenBenchFloresBitextMining, IndicLangClassification, IndonesianIdClickbaitClassification, IsiZuluNewsClassification, ItaCaseholdClassification, JSICK, KorHateSpeechMLClassification, KorSarcasmClassification, KurdishSentimentClassification, LEMBPasskeyRetrieval, LegalBenchCorporateLobbying, MIRACLRetrievalHardNegatives, MLQARetrieval, MacedonianTweetSentimentClassification, MalteseNewsClassification, MasakhaNEWSClassification, MasakhaNEWSClusteringS2S, MassiveIntentClassification, MedrxivClusteringP2P.v2, MultiEURLEXMultilabelClassification, MultiHateClassification, NTREXBitextMining, NepaliNewsClassification, News21InstructionRetrieval, NollySentiBitextMining, NordicLangClassification, NorwegianCourtsBitextMining, NusaParagraphEmotionClassification, NusaTranslationBitextMining, NusaX-senti, NusaXBitextMining, OdiaNewsClassification, OpusparcusPC, PAC, PawsXPairClassification, PlscClusteringP2P.v2, PoemSentimentClassification, PolEmo2.0-OUT, 
PpcPC, PunjabiNewsClassification, RTE3, Robust04InstructionRetrieval, RomaniBibleClustering, RuBQReranking, SCIDOCS, SIB200ClusteringS2S, SICK-R, STS12, STS13, STS14, STS15, STS17, STS22.v2, STSB, STSBenchmark, STSES, ScalaClassification, SemRel24STS, SentimentAnalysisHindi, SinhalaNewsClassification, SiswatiNewsClassification, SlovakMovieReviewSentimentClassification, SpartQA, SprintDuplicateQuestions, StackExchangeClustering.v2, StackOverflowQA, StatcanDialogueDatasetRetrieval, SwahiliNewsClassification, SwednClusteringP2P, SwissJudgementClassification, T2Reranking, TERRa, TRECCOVID, Tatoeba, TempReasonL1, ToxicConversationsClassification, TswanaNewsClassification, TweetTopicSingleClassification, TwitterHjerneRetrieval, TwitterURLCorpus, VoyageMMarcoReranking, WebLINXCandidatesReranking, WikiCitiesClustering, WikiClusteringP2P.v2, WikipediaRerankingMultilingual, WikipediaRetrievalMultilingual, WinoGrande, XNLI, indonli

Results for KaLM-Team/KaLM-Embedding-X-0605

task_name KaLM-Team/KaLM-Embedding-X-0605 google/gemini-embedding-001 intfloat/multilingual-e5-large Max result
AILAStatutes 0.47 0.49 0.21 0.01
AfriSentiClassification 0.54 0.54 0.46 0.01
AlloProfClusteringS2S.v2 0.59 0.56 0.33 0.01
AlloprofReranking 0.81 0.82 0.69 0.01
AmazonCounterfactualClassification 0.88 0.88 0.70 0.92
ArXivHierarchicalClusteringP2P 0.64 0.65 0.56 0.01
ArXivHierarchicalClusteringS2S 0.64 0.64 0.54 0.01
ArguAna 0.58 0.86 0.54 0.64
ArmenianParaphrasePC 0.96 0.97 0.95 0.01
BUCC.v2 0.99 0.99 0.99 0.01
BelebeleRetrieval 0.85 0.91 0.78 0.01
BibleNLPBitextMining 0.20 0.21 0.17 0.01
BigPatentClustering.v2 0.46 0.38 0.31 0.00
BiorxivClusteringP2P.v2 0.49 0.54 0.37 0.01
BornholmBitextMining 0.52 0.52 0.44 0.01
BrazilianToxicTweetsClassification 0.24 0.28 0.21 0.00
BulgarianStoreReviewSentimentClassfication 0.80 0.78 0.64 0.01
CEDRClassification 0.43 0.57 0.45 0.01
CLSClusteringP2P.v2 0.47 0.43 0.40 0.01
CSFDSKMovieReviewSentimentClassification 0.57 0.49 0.35 0.01
CTKFactsNLI 0.83 0.88 0.80 0.01
CataloniaTweetClassification 0.48 0.55 0.50 0.01
Core17InstructionRetrieval 0.05 0.08 -0.02 0.00
CovidRetrieval 0.82 0.79 0.76 0.01
CyrillicTurkicLangClassification 0.81 0.95 0.41 0.01
CzechProductReviewSentimentClassification 0.67 0.68 0.57 0.01
DBpediaClassification 0.95 0.95 0.88 0.01
DalajClassification 0.50 0.50 0.50 0.01
DiaBlaBitextMining 0.88 0.87 0.85 0.01
EstonianValenceClassification 0.68 0.54 0.43 0.01
FaroeseSTS 0.80 0.86 0.72 0.01
FilipinoShopeeReviewsClassification 0.50 0.48 0.35 0.01
FinParaSTS 0.26 0.29 0.25 0.00
FinancialPhrasebankClassification 0.94 0.89 0.84 0.01
FloresBitextMining 0.78 0.84 0.81 0.01
GermanSTSBenchmark 0.85 0.88 0.84 0.01
GreekLegalCodeClassification 0.42 0.44 0.37 0.01
GujaratiNewsClassification 0.92 0.92 0.77 0.01
HALClusteringS2S.v2 0.32 0.32 0.23 0.00
HagridRetrieval 0.99 0.99 0.99 0.01
IN22GenBitextMining 0.92 0.94 0.77 0.01
IndicCrosslingualSTS 0.53 0.63 0.44 0.01
IndicGenBenchFloresBitextMining 0.96 0.97 0.89 0.01
IndicLangClassification 0.86 0.88 0.20 0.01
IndonesianIdClickbaitClassification 0.64 0.67 0.61 0.01
IsiZuluNewsClassification 0.33 0.41 0.32 0.00
ItaCaseholdClassification 0.68 0.73 0.67 0.01
JSICK 0.85 0.85 0.80 0.01
KorHateSpeechMLClassification 0.22 0.18 0.10 0.00
KorSarcasmClassification 0.66 0.61 0.57 0.01
KurdishSentimentClassification 0.83 0.86 0.77 0.01
LEMBPasskeyRetrieval 0.39 0.39 0.38 0.01
LegalBenchCorporateLobbying 0.94 0.96 0.90 0.01
MIRACLRetrievalHardNegatives 0.61 0.70 0.67 0.01
MLQARetrieval 0.81 0.84 0.76 0.01
MacedonianTweetSentimentClassification 0.72 0.72 0.62 0.01
MalteseNewsClassification 0.47 0.37 0.24 0.00
MasakhaNEWSClassification 0.84 0.84 0.78 0.01
MasakhaNEWSClusteringS2S 0.64 0.57 0.38 0.01
MassiveIntentClassification 0.77 0.82 0.60 0.85
MedrxivClusteringP2P.v2 0.40 0.47 0.34 0.01
MultiEURLEXMultilabelClassification 0.05 0.05 0.05 0.00
MultiHateClassification 0.83 0.72 0.64 0.01
NTREXBitextMining 0.90 0.94 0.91 0.01
NepaliNewsClassification 0.98 0.98 0.88 0.01
News21InstructionRetrieval 0.02 0.10 -0.00 0.00
NollySentiBitextMining 0.58 0.69 0.67 0.01
NordicLangClassification 0.72 0.86 0.80 0.01
NorwegianCourtsBitextMining 0.93 0.93 0.94 0.01
NusaParagraphEmotionClassification 0.47 0.56 0.42 0.01
NusaTranslationBitextMining 0.71 0.78 0.67 0.01
NusaX-senti 0.78 0.80 0.71 0.01
NusaXBitextMining 0.83 0.83 0.73 0.01
OdiaNewsClassification 0.85 0.92 0.80 0.01
OpusparcusPC 0.96 0.97 0.95 0.01
PAC 0.70 0.72 0.70 0.01
PawsXPairClassification 0.60 0.60 0.55 0.01
PlscClusteringP2P.v2 0.75 0.74 0.72 0.01
PoemSentimentClassification 0.72 0.60 0.51 0.01
PolEmo2.0-OUT 0.76 0.78 0.36 0.01
PpcPC 0.94 0.96 0.92 0.01
PunjabiNewsClassification 0.84 0.83 0.81 0.01
RTE3 0.89 0.90 0.88 0.01
Robust04InstructionRetrieval -0.01 -0.02 -0.07 0.00
RomaniBibleClustering 0.43 0.43 0.41 0.00
RuBQReranking 0.76 0.74 0.76 0.01
SCIDOCS 0.22 0.25 0.17 0.25
SIB200ClusteringS2S 0.45 0.42 0.24 0.00
SICK-R 0.81 0.83 0.80 0.82
STS12 0.81 0.82 0.80 0.80
STS13 0.86 0.90 0.82 0.89
STS14 0.82 0.85 0.78 0.85
STS15 0.88 0.90 0.89 0.89
STS17 0.83 0.89 0.82 0.91
STS22.v2 0.72 0.72 0.64 0.01
STSB 0.82 0.85 0.82 0.01
STSBenchmark 0.86 0.89 0.87 0.88
STSES 0.77 0.82 0.80 0.01
ScalaClassification 0.51 0.52 0.52 0.01
SemRel24STS 0.68 0.73 0.63 0.01
SentimentAnalysisHindi 0.79 0.76 0.64 0.01
SinhalaNewsClassification 0.80 0.82 0.67 0.01
SiswatiNewsClassification 0.52 0.62 0.54 0.01
SlovakMovieReviewSentimentClassification 0.94 0.90 0.74 0.01
SpartQA 0.10 0.10 0.06 0.00
SprintDuplicateQuestions 0.95 0.97 0.93 0.96
StackExchangeClustering.v2 0.66 0.92 0.46 0.01
StackOverflowQA 0.93 0.97 0.89 0.01
StatcanDialogueDatasetRetrieval 0.47 0.51 0.11 0.01
SwahiliNewsClassification 0.65 0.66 0.60 0.01
SwednClusteringP2P 0.49 0.46 0.37 0.01
SwissJudgementClassification 0.55 0.58 0.54 0.01
T2Reranking 0.67 0.68 0.66 0.01
TERRa 0.57 0.64 0.58 0.01
TRECCOVID 0.85 0.86 0.71 0.85
Tatoeba 0.81 0.82 0.76 0.01
TempReasonL1 0.02 0.03 0.01 0.00
ToxicConversationsClassification 0.86 0.89 0.66 0.87
TswanaNewsClassification 0.51 0.53 0.47 0.01
TweetTopicSingleClassification 0.78 0.71 0.65 0.01
TwitterHjerneRetrieval 0.79 0.98 0.35 0.01
TwitterURLCorpus 0.87 0.87 0.86 0.87
VoyageMMarcoReranking 0.71 0.67 0.68 0.01
WebLINXCandidatesReranking 0.16 0.11 0.08 0.00
WikiCitiesClustering 0.94 0.92 0.76 0.01
WikiClusteringP2P.v2 0.32 0.28 0.26 0.00
WikipediaRerankingMultilingual 0.89 0.92 0.89 0.01
WikipediaRetrievalMultilingual 0.90 0.94 0.90 0.01
WinoGrande 0.47 0.61 0.55 0.01
XNLI 0.82 0.85 0.75 0.01
indonli 0.58 0.61 0.52 0.01
Average 0.66 0.68 0.59 0.10
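
As a reading aid, the per-task comparison in the table can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the MTEB tooling: the helper names `parse_rows` and `best_models` are hypothetical, the five rows are copied from the table above with the "Max result" column left out, and ties are reported rather than broken.

```python
# Column order of the three score columns in the table above.
MODELS = (
    "KaLM-Team/KaLM-Embedding-X-0605",
    "google/gemini-embedding-001",
    "intfloat/multilingual-e5-large",
)

# A handful of rows copied from the table (task name plus the three model scores;
# the "Max result" column is intentionally omitted).
ROWS = """\
BigPatentClustering.v2 0.46 0.38 0.31
ArguAna 0.58 0.86 0.54
EstonianValenceClassification 0.68 0.54 0.43
TwitterHjerneRetrieval 0.79 0.98 0.35
HagridRetrieval 0.99 0.99 0.99
"""


def parse_rows(text):
    """Parse whitespace-separated rows into {task: {model: score}} (hypothetical helper)."""
    table = {}
    for line in text.strip().splitlines():
        task, *scores = line.split()
        table[task] = dict(zip(MODELS, map(float, scores)))
    return table


def best_models(table):
    """Map each task to the list of models sharing the top score (hypothetical helper)."""
    result = {}
    for task, scores in table.items():
        top = max(scores.values())
        result[task] = [model for model, score in scores.items() if score == top]
    return result


table = parse_rows(ROWS)
winners = best_models(table)
for task, models in winners.items():
    print(f"{task}: {', '.join(models)}")
```

The same two helpers would work on the full table if every row were pasted into `ROWS` in the same four-field form.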

@KennethEnevoldsen added the "waiting for review of implementation" label (This PR is waiting for an implementation review before merging the results.) on Jun 25, 2025
@YanshekWoo
Contributor Author

@KennethEnevoldsen
The model card for the implementation has just been merged into MTEB. What is the status of this PR?
Is there anything that I need to do?

@Samoed merged commit f050a02 into embeddings-benchmark:main on Jul 3, 2025
4 checks passed
